Abstract: Empowered by today’s rich tools for media generation
and distribution, and by convenient Internet access,
crowdsourced streaming generalizes the single-source streaming
paradigm by including massive contributors for a video channel.
It calls for a joint optimization along the path from crowdsourcers,
through streaming servers, to the end-users to minimize the
overall latency. The dynamics of the video sources, together
with the globalized request demands and the high computation
demand from each sourcer, make crowdsourced live streaming
challenging even with powerful support from modern cloud
computing. In this paper, we present a generic framework that
facilitates a cost-effective cloud service for crowdsourced live
streaming. Through adaptive leasing, the cloud servers can be
provisioned in a fine granularity to accommodate geo-distributed
video crowdsourcers. We present an optimal solution to deal with
service migration among cloud instances of diverse lease prices.
It also addresses the impact of location on the streaming quality.
To understand the performance of the proposed strategies in
the real world, we have built a prototype system running over
PlanetLab and the Amazon/Microsoft clouds. Our extensive
experiments demonstrate the effectiveness of our solution in
terms of deployment cost and streaming quality.

I. Introduction 

The Internet has witnessed a significant increase in the 
popularity of media streaming with multi-source channels. In 
traditional video broadcast, the content of a channel generally 
comes from a single source, though it could be replicated and 
then streamed from different servers in a Content Distribution 
Network (CDN). A multi-source system, however, not only
serves a massive audience worldwide, but its content also comes
from multiple contributing sources. For example, on Feb.
17, 2012, NASA Television’s Public and Media channels
began to transmit their respective content in high definition
(HD), with live feeds from such space centers as the NASA
Headquarters, the Johnson Space Center, and the Goddard
Space Flight Center¹. With their respective content sources,
they collectively serve the users interested in the stories and the
latest news from NASA. In the very recent 2014 Sochi Winter
Olympics, NBC had a total of 41 live feeds distributed both
in Sochi and in the USA², and in the FIFA World Cup 2014,
when a goal was scored, CBC synchronized the live scenes
of the cheering fans in the public squares from a number of
cities worldwide in its live streaming channel. The evolution is
driven further by today’s advanced mobile/tablet devices that
can readily capture high quality video anywhere and anytime
(e.g., iPhone 6 supports both 60 fps 1080p video recording
and 240 fps slow-motion recording for 720p videos), and such
mainstream video sharing platforms as YouTube and VeedMe
have already enabled multi-party collaborative video content
production. All these together are shifting the video service
paradigm from the conventional single-source, to multi-source,
to many-source, and now toward crowdsource, in which the
available video sources for the content of interest become
highly diverse and scalable.

¹ http://www.nasa.gov/multimedia/index.html

² http://www.istreamplanet.com/sochi-2014/

Global streaming imposes high demand on end device
capabilities and network connections. The situation is further
complicated in a crowdsourced streaming system. First,
crowdsourced videos are geo-distributed: they come from all over the
world, and then spread all over the world. Not only is the scale of
the consumers enormous, but so is that of the contributors.
Second, the crowdsourcers are often much more dynamic than
dedicated content providers, as they can start or terminate
a video contribution at will. This is particularly
true when non-professionals use their smartphones/tablets
for video production. Third, for collective content production,
massive server capacity is necessary to deal with online
video synchronization, processing, and transcoding for highly
heterogeneous video contributors and consumers. For example,
Twitch TV, the world’s leading video platform and community
for gamers³, allows any of its users to broadcast their live
streaming videos online through their PCs or PS3/Xbox
consoles. It attracts over 44 million visitors per month, and
every second its servers are loaded with thousands of live
channels. For such a large system, significant effort is needed
to collect the highly dynamic and distributed video streams
online, and to process and distribute the live channels to
subscribers all over the world.

Elastic resource provisioning and computation offloading
are where cloud computing platforms excel [10]. A new
generation of cloud-based multimedia services, e.g., Netflix,
has emerged in recent years and is rapidly
changing the operation and business models in the market.
Facing similar scale challenges, crowdsourced live streaming
would benefit from cloud services, too. Yet the distributed
and highly dynamic sources, as well as the much more
stringent delay constraints imposed by live streaming, make
the problem more involved, and it remains to be explored
with novel and distinct solutions.

³ http://www.twitch.tv/



In this paper, using real-world measurements, we identify
the potential benefits as well as the key challenges when
crowdsourced video meets cloud. We present a generic
framework for a cost-effective cloud service that provisions cloud
resources in a fine granularity to work with geo-distributed
video crowdsourcers. Using adaptive and collaborative
leasing strategies, our design accommodates the diverse
capacities and prices of cloud instances, and addresses the
impact of location on the streaming quality. We have built a
prototype system running over the Internet and the Amazon
EC2/Microsoft Azure clouds, and the experimental results
demonstrate the effectiveness of our solution in
terms of both deployment cost and streaming quality.

The remainder of this paper proceeds as follows. Section II
discusses the background and related work. Section III presents
an overview of the crowdsourced live streaming system, and
analyzes its unique challenges using real-world data traces. In
Section IV, we first investigate the inherent problem of the cloud
leasing strategy. An optimal solution is then developed to deal
with geo-distributed crowdsourcers in Section V. In Section VI,
we present a prototype platform with the measurement results
and the trace-driven simulation. Finally, Section VII concludes
the paper and discusses potential future directions.

II. Background and Related Work 

In the past two decades, video streaming over the Internet
has quickly risen to become a mainstream "killer" application
[2]. For large scale distribution, many existing systems rely
on content distribution networks (CDNs) [11], peer-to-peer
(P2P) networks [1], or hybrid solutions [8]. More recently, with its
flexible and elastic resource provisioning, cloud computing
has proven to be an efficient solution toward highly
scalable video distribution. A prominent example is Netflix,
a major on-demand Internet video provider. Netflix migrated
its entire infrastructure to the Amazon AWS cloud
in 2012, using EC2 for transcoding master video copies to
50 different versions for heterogeneous end users and S3 for
content storage [11]. In total, Netflix has over 1 petabyte
of media data stored in Amazon’s cloud. It leases the
computation, bandwidth, and storage resources at much lower
long-term costs than those of over-provisioned self-owned
servers, and reacts better and faster to user demand at a
dramatically increasing scale. There have been pioneering studies
on migrating video services to the cloud to accommodate
worldwide-distributed and time-varying video demands [10],
[2]. Aggarwal et al. [13] showed that the cost of IPTV services
can be noticeably reduced through a cloud infrastructure, and
Wu et al. [10] utilized a geo-distributed cloud to support large
scale social media streaming applications. Wang et al. [8]
presented CALMS (Cloud-Assisted Live Media Streaming) to
lease and adjust cloud server resources in a fine granularity,
so as to meet the temporal and spatial dynamics of demands
from online users.

Empowered by today’s rich tools for media generation and
collaborative production, and by convenient Internet access,
crowdsourcing further extends the single-source paradigm. It
combines the efforts of multiple self-identified contributors,
known as crowdsourcers, for a greater result, and has seen
success in many areas [3]. For example, LiFS (Locating in
Fingerprint Space) was developed for wireless indoor localization
with smartphone-based crowdsourcing [4]. Ou et al. [5] used
a crowdsourcing approach to optimize mobile devices’ energy
efficiency by utilizing signal strength traces shared by other
devices in cellular networks. For video applications, a scalable
system that allows users to perform content-based searches on
a continuous collection of crowdsourced video was proposed
in [7]. Biel et al. [6] investigated the crowdsourcing of
personal and social traits in online social video or social
media content in general. Recently, YouTube has integrated
with Google Moderator, a crowdsourcing and feedback
product, to increase the engagement between viewers and
content creators. Such other video sharing sites as Poptent and
VeedMe have also opened interfaces for crowdsourcers with
user generated content. Crowdsourced live streaming services
have emerged in the market as well, especially for
online sports broadcasts. Examples include Stream2Watch.me
and sportLEMON.tv.

Figure 1: A generic crowdsourced live streaming system over cloud

Our study is motivated by these pioneering works. Yet
crowdsourced live streaming demands efficient content
collection, processing, and distribution with stringent delay
constraints, which remain to be explored. This paper highlights
these unique challenges, particularly when crowdsourced live
streaming meets cloud, and presents our initial attempts toward
addressing these challenges.

III. Crowdsourced Live Streaming: System Overview and Challenges

We illustrate a generic crowdsourced live streaming system
with geo-distributed crowdsourcers and viewers in Fig. 1.
A set of crowdsourcers (or sourcers for short) upload their
individual video contents in real time, which, through a video
production engine, collectively produce a single video stream.
The stream is then distributed live to interested viewers.
Both the sourcers and viewers can be heterogeneous, in terms
of their network bandwidth and their hardware/software
configurations for video capture and playback. As such, real-time
transcoding is necessary during both uploading and
downloading, so as to unify the diverse video bitrates/formats from
different sourcers for content production, and to replicate the
output video stream to serve the heterogeneous viewers,
possibly through a CDN with such adaptation mechanisms
as DASH (Dynamic Adaptive Streaming over HTTP) [12].

Figure 2: Number of viewers and source streams variation in one week

Figure 3: Source stream distribution in one day

Figure 4: Viewer demand for the distributed source streams in one day

Table I: Top 5 sourcers from Twitch.tv on July 12, 2014

Sourcer ID          Time (Pacific Time)         Location
riotgames           11:10-15:40                 Cologne, Germany
dota2ti_ru          07:10-18:10                 Seattle, USA
srkevol             06:00-23:40                 Las Vegas, USA
riotgamesturkish    01:30-07:10                 Istanbul, Turkey
ongamenet           03:00-13:30, 18:20-22:40    Seoul, South Korea

This generic architecture reflects that of state-of-the-art
real-world systems. For example, NBC’s video content from the
41 feeds in the Sochi Winter Olympics was encoded by Windows
Azure Media Services to the 1080p format, and dynamically
transcoded into HLS and HDS formats. These streams were
then pulled from Azure to Akamai’s CDN and distributed
to audiences on targeted devices, resulting in over 3000 hours
of live Olympics streaming content.

Given the large system scale and the high bandwidth,
storage, and computation demands involved, cloud services with
elastic resource provisioning are expected. We again consider
a generic geo-distributed cloud infrastructure, which consists
of multiple cloud sites distributed in different geographical
locations (e.g., US East (N. Virginia) and EU (Ireland) in
the Amazon EC2 cloud) [10]. Each cloud site resides in a data
center, and contains a collection of interconnected and
virtualized servers. The server resources will be provisioned for
crowdsourced live streaming, e.g., computation resources for
collective production and transcoding.

Optimization for conventional single-source video
streaming is generally viewer-driven: the resource provisioning
depends on the distribution of the viewers. In crowdsourced
video, however, the sourcers themselves come from all over the
world, and their distribution must also be taken into account
during resource provisioning. This is further aggravated given
that the collaborative production escalates the demands on
both bandwidth and computation. The crowdsourced streaming
workflow is also much more dynamic, as individual sourcers
can start/terminate based on their own schedules.

To better understand the inherent challenges of deploying
such a system, we crawled a one-week trace from July
6 to July 12, 2014 from the Twitch.tv website, which has 14
geo-distributed ingest servers, 1 in the Asia area (AS for short),
6 in the European area (EU for short), and 7 in the United
States area (US for short), to broadcast live game streams to
viewers on a global scale. For simplicity, we consider that
one live stream is contributed by only one sourcer. Fig. 2
shows the variation in the number of viewers and streams over
the week. First, it is obvious that the
number of viewers is highly dynamic, which is prevalent in
current large scale systems [2]. Due to the differences in time
zones and languages, the distribution of viewers can be
time-varying, which has been discussed in previous works [9], [8].
Similar to the number of viewers, the number
of source streams also varies greatly within one day,
from about 5000 streams in the early morning to almost
12000 streams in the afternoon. To further investigate the
time-varying distribution of the source streams, we measured
the top 15 streams with the highest viewer population from
3:00 AM to midnight (PST) on July 12, 2014, and list the
five most popular streams in Table I. We can see that not only
the time periods but also the locations of the stream sourcers
are highly dynamic. In Fig. 3, we divide the locations into AS,
EU, and US, and record the percentage of source streams from
each region for every 30 minutes between 3:00 AM and
midnight. It can be easily observed that most of the streams from
Asia and Europe occur during the morning and afternoon, and
the number of streams from the United States keeps growing
when night falls. We further measure the viewer population for
the distributed source streams from each region in Fig. 4. We
can see that in the early morning between 3:00 AM and 7:00
AM, most of the popular streams come from Europe or Asia.
We conjecture that this is because the local times in Europe and
Asia are in the afternoon or evening, and there are more online
sourcers from these regions during that time. Meanwhile, the
viewer demand from these areas can also be more active during
this period, and most of the viewers may prefer the streams
of sourcers speaking their native language. Similar reasons can
also explain the increase of viewer demand for the source
streams from the United States after 3:00 PM.
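
The per-region shares described above can be derived from a crawl with a simple binning step. The following is a minimal sketch, assuming each observation is a hypothetical (minute of day, region) pair; the function name and the sample data are illustrative only and are not taken from the Twitch trace.

```python
from collections import Counter, defaultdict


def region_share_per_slot(samples, slot_minutes=30):
    """Return {slot_start_minute: {region: fraction of streams}}.

    samples: iterable of (minute_of_day, region) observations,
             one per live source stream seen by the crawler.
    """
    counts = defaultdict(Counter)
    for minute, region in samples:
        # Map the observation to the start of its 30-minute slot.
        slot = (minute // slot_minutes) * slot_minutes
        counts[slot][region] += 1
    shares = {}
    for slot, counter in sorted(counts.items()):
        total = sum(counter.values())
        shares[slot] = {region: n / total for region, n in counter.items()}
    return shares


# Made-up observations between 3:00 AM (minute 180) and 4:00 AM (minute 240).
samples = [(185, "EU"), (190, "EU"), (200, "AS"), (205, "US"),
           (215, "EU"), (220, "US"), (230, "US"), (235, "US")]
shares = region_share_per_slot(samples)
print(shares[180])  # share of each region in the 3:00-3:30 slot
```

The same binning applies to viewer counts if each observation is weighted by the number of viewers of the stream.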

In summary, in a crowdsourced live streaming system,
both the number and the distribution of the crowdsourcers
can be highly dynamic. Together with time-varying viewer
demand, this makes conventional server allocation considerably more
challenging at a large scale. We will utilize the cloud service
to coordinate the crowdsourcers and viewers. The cloud server
instances (e.g., EC2 in the Amazon cloud) are provisioned to
collect and process the live feeds of the crowdsourcers, and
the cloud CDNs (e.g., CloudFront in the Amazon cloud) are
deployed to handle the viewer dynamics. Through dynamic
cloud leasing, we will present a cost-effective solution with
a streaming quality guarantee.

IV. Cloud-Assistance for Crowdsourced Live Streaming

In this section, we first model the global cloud service 
leasing strategy with quality guarantee, and transform it into an 
equivalent problem in a directed graph. We will then present 
an optimal algorithm and an efficient online heuristic solution 
based on the equivalent problem. 

A. Problem Formulation 

We use $\mathbb{A}$ to denote the global areas, which can be divided
into $n$ different regions as $\mathbb{A} = \{A_1, A_2, \ldots, A_n\}$. Assume
that there are $m$ cloud sites all over the world, represented
as $\mathbb{S} = \{s_1, s_2, \ldots, s_m\}$. As most cloud providers have a
minimum unit time for the duration of leasing a server (e.g.,
1 hour for Amazon EC2), we use $T$ to denote this duration.
We define a time slice as an integer multiple $k$ ($k \in \mathbb{N}^+$) of $T$,
and at the beginning of each time slice $kT$, our cloud leasing
strategy decides whether to provision or terminate
the cloud servers in the distributed regions. We assume that the
schedules of crowdsourced streams are predictable and can be
known beforehand. The rationale is twofold. First,
in practice a large portion of crowdsourced streams are driven
by well-scheduled events (e.g., as one of the top 5 sourcers
from Twitch.tv in Table I, the channel of srkevol follows the strict
schedule of the Evolution 2014 Tournament⁴). Moreover,
many self-motivated crowdsourcers prefer a regular broadcast
schedule every day to attract more viewers. We can accordingly
forecast the numbers and distributions of both crowdsourcers
and viewers for the next time slice, e.g., using techniques from
[8], [14].

⁴ http://evo2014.s3.amazonaws.com/brackets/index.html

For a given time $t$, we denote the set of source streams from
the crowdsourcers as $L(t)$. According to the location
distribution of crowdsourcers, we can specify the set as
$L_A(t) = \{l_{A_1}(t), l_{A_2}(t), \ldots, l_{A_n}(t)\}$ for the $n$ different regions,
respectively. As all these live streams are served by the provisioned
cloud instances, we further consider the set according to the
dedicated cloud sites as $L_S(t) = \{l_{s_1}(t), l_{s_2}(t), \ldots, l_{s_m}(t)\}$,
where $l_{s_j}(t)$ represents the live streaming sources loaded in
cloud site $s_j$. For example, if $l_{s_j}(t) = \emptyset$, no crowdsourced
stream is served by cloud site $s_j$, i.e., cloud site $s_j$ does not
need to be leased at time $t$. Otherwise, if the live streams
from areas $A_2$, $A_3$, and $A_5$ are served by cloud site $s_j$, we
have $l_{s_j}(t) = l_{A_2}(t) \cup l_{A_3}(t) \cup l_{A_5}(t)$.

We denote the server provisioning cost at time $t$ as
$C_p(t) = \sum_{j=1}^{m} c_j^p(l_{s_j}(t))$, where $c_j^p(\cdot)$ is the price of the leased instances
in cloud site $s_j$. We assume that there is always a bootstrapping
server that redirects the global live sources to the distributed
streaming servers, with cost $c_0$. To offload the bandwidth
support for the diverse viewer demands from the cloud servers,
a globalized CDN strategy (e.g., CloudFront in Amazon) is
deployed to distribute the live streams all over the world.
The cost of outbound traffic from the cloud servers to the
CDN can be calculated from the number of channels loaded
in the cloud servers, and is denoted as $C_b = \sum_{j=1}^{m} c_j^b(l_{s_j}(t))$.
The cost of the bandwidth support from the CDN to
the global viewers is proportional to the viewer demands
$D(t)$, which aggregate the per-region demands $D_{l_{A_i}}(t)$, where
$D_{l_{A_i}}(t)$ represents the viewer
demands for the crowdsourced streams from region $A_i$. We
can thus denote the total cost of the CDN as $C_d = c_d(D(t))$, with
$c_d$ as the cost to support one unit of the viewer demand. The
total cost of the crowdsourced live streaming system can then
be calculated as follows:

\[
\begin{aligned}
Cost_{total} &= c_0 + C_p + C_b + C_d \\
&= c_0 + \sum_{j=1}^{m} \left[ c_j^p(l_{s_j}(t)) + c_j^b(l_{s_j}(t)) \right] + c_d(D(t)) \\
&= c_0 + \underbrace{\sum_{j=1}^{m} c_j(l_{s_j}(t))}_{Cost_{lease}} + c_d(D(t))
\end{aligned}
\]

where $c_j(\cdot)$ can be determined by the price policy of instance
leasing and data traffic in cloud site $s_j$. As the first and last
costs on the right side of the equation cannot be reduced, we
focus on minimizing the middle part of the total cost, i.e., the
cloud leasing cost, which we denote as $Cost_{lease}$.
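
To make the cost decomposition concrete, the following is a minimal sketch of the total cost for one time slice. It assumes, purely for illustration, that each site's price policy $c_j(\cdot)$ is a function of the number of streams loaded on that site and that the CDN cost is linear in the viewer demand; the function name, the price policies, and the numbers are hypothetical.

```python
def total_cost(c0, site_loads, lease_price, unit_cdn_cost, viewer_demand):
    """Return (Cost_total, Cost_lease) for one time slice.

    site_loads:  {site: number of source streams loaded on that site}
    lease_price: {site: callable c_j(load)}, assumed to bundle instance
                 leasing and outbound traffic to the CDN
    """
    # A site with no streams is not leased, so it contributes no cost.
    cost_lease = sum(lease_price[s](load)
                     for s, load in site_loads.items() if load > 0)
    # Assumption: the CDN cost is linear in the total viewer demand.
    cost_cdn = unit_cdn_cost * viewer_demand
    return c0 + cost_lease + cost_cdn, cost_lease


# Hypothetical linear price policies for two cloud sites.
lease_price = {"s1": lambda n: 2.0 * n, "s2": lambda n: 1.5 * n}
cost_total, cost_lease = total_cost(
    c0=1.0, site_loads={"s1": 3, "s2": 0},
    lease_price=lease_price, unit_cdn_cost=0.5, viewer_demand=20)
print(cost_total, cost_lease)
```

Only `cost_lease` depends on how streams are assigned to sites, which is why the optimization below targets that term.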

We assume that the live crowdsourcers in each region, $l_{A_i}(t)$,
have a preference value for a given cloud site $s_j$, which we
denote as $P(l_{A_i}(t), s_j)$. Generally, the preference value can
be quantified according to the RTT, jitter, or packet loss values
of the connections between the crowdsourcers and the given
cloud site. For example, it can be defined as a concave decreasing function of
the estimated latency or a concave increasing function of the
estimated connection speed in a geo-distributed service [9]. To
guarantee the streaming quality of the crowdsourced streams
in region $A_i$, we only consider allocating these streams to the
cloud sites with the top $k$ preference values, and define the set
of these cloud sites as $Index(l_{A_i}(t), k)$ for the crowdsourced
streams $l_{A_i}(t)$. As a real-world example, Twitch/Justin.tv
provides an ingest server ranker program that returns the list
of the top 3 servers for each crowdsourcer.
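
As an illustration of the preference value and the top-$k$ set, the sketch below uses one possible concave decreasing function of latency, $1 - (x / x_{max})^2$. This particular form, the 400 ms cap, and the latency values are assumptions made for the example and are not prescribed by the model.

```python
def preference(latency_ms, max_latency_ms=400.0):
    """Map latency to a preference in [0, 1].

    Assumed form: 1 - (x / x_max)^2, which is concave and decreasing
    on [0, x_max]. Latencies outside that range are clamped.
    """
    x = min(max(latency_ms, 0.0), max_latency_ms) / max_latency_ms
    return 1.0 - x * x


def index_top_k(latencies, k):
    """Return the k most preferred sites for a region, best first.

    latencies: {site: estimated latency in ms from the region}
    """
    ranked = sorted(latencies,
                    key=lambda s: preference(latencies[s]), reverse=True)
    return ranked[:k]


# Hypothetical RTT estimates from one region to four cloud sites.
rtt = {"s1": 40.0, "s2": 120.0, "s3": 200.0, "s4": 320.0}
top3 = index_top_k(rtt, 3)
print(top3)
```

With $k = 3$, this mirrors the ingest server ranker mentioned above: the region may only be served by one of the returned sites.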

The cloud service leasing problem in our geo-distributed
crowdsourced live streaming system can thus be formulated
as finding a cloud site leasing strategy $L_S$, subject to the
following constraints:

(1) Cloud site service constraint:

\[ \forall A_i \in \mathbb{A},\ \exists\, l_{s_j} \in L_S,\ l_{A_i} \subseteq l_{s_j} \]
\[ \forall l_{s_j}, l_{s_{j'}} \in L_S,\ \text{if } j \neq j',\ l_{s_j} \cap l_{s_{j'}} = \emptyset \]

(2) Crowdsourcer preference constraint:

\[ \forall A_i \in \mathbb{A},\ s_j \in \mathbb{S},\ \text{if } l_{A_i} \subseteq l_{s_j},\ \text{then } s_j \in Index(l_{A_i}(t), k) \]

(3) Total budget constraint:

\[ Cost_{lease} + c_0 + C_d \leq Cost_{max} \]

Figure 5: An illustrative example of (a) distribution graph; (b) service migration vectors

Figure 6: An illustrative example of (a) a constructed service migration graph; (b) migrated cloud service for geo-distributed crowdsourcers

The cloud site service constraint states that the
crowdsourced live streams in a given region are served by only
one cloud site. The preference constraint guarantees that the
crowdsourced live streams in each region are collected by one
of the cloud sites with the corresponding top $k$ preference
values. The total budget constraint demands that the total cost,
including the bootstrapping server, the provisioned cloud sites,
and the CDN utilization, must not exceed the total budget
$Cost_{max}$. Our objective is to maximize the global relative
preference of the crowdsourcers, which is defined as:

\[
P_{global} = \frac{\sum_{\forall s_j \in \mathbb{S},\ l_{A_i} \subseteq l_{s_j}} |D_{l_{A_i}}(t)| \cdot P(l_{A_i}(t), s_j)}{\sum_{\forall A_i \in \mathbb{A}} |D_{l_{A_i}}(t)| \cdot P(l_{A_i}(t), Index(l_{A_i}(t), 1))}
\]

where, for ease of exposition, we also use $Index(l_{A_i}(t), 1)$ to
denote the top 1 preferred cloud site for the live crowdsourced
streams $l_{A_i}(t)$. We use $|D_{l_{A_i}}(t)|$ to represent the size of the viewer
demands for crowdsourced streams $l_{A_i}(t)$, and $P_{global}$ is thus
a relative ratio in the range $(0, 1]$ on the global scale.
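
The definition of the global relative preference can be checked numerically. The sketch below is a direct transcription of the ratio, with hypothetical demand sizes and preference values for two regions and two sites; the function and variable names are illustrative.

```python
def global_preference(demand, pref, assignment):
    """Relative global preference P_global of an assignment.

    demand:     {region: |D|, viewer demand for that region's streams}
    pref:       {region: {site: preference value P}}
    assignment: {region: site that serves the region}
    """
    achieved = sum(demand[r] * pref[r][assignment[r]] for r in demand)
    # Denominator: every region served by its top-1 preferred site.
    best = sum(demand[r] * max(pref[r].values()) for r in demand)
    return achieved / best


demand = {"A1": 100, "A2": 300}
pref = {"A1": {"s1": 1.0, "s2": 0.5}, "A2": {"s1": 0.5, "s2": 1.0}}
# Both regions share site s2 versus each region on its top choice.
p_shared = global_preference(demand, pref, {"A1": "s2", "A2": "s2"})
p_top = global_preference(demand, pref, {"A1": "s1", "A2": "s2"})
print(p_shared, p_top)
```

Moving the small region A1 off its preferred site lowers the ratio only modestly because the ratio is weighted by viewer demand.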

To make our solution cost-effective, we also need a second
objective, i.e., to minimize the cloud leasing cost $Cost_{lease}$.
It is easy to see that these two objectives (i.e., $P_{global}$ and
$Cost_{lease}$) may conflict with each other, since always
leasing the top preferred cloud server can increase the leasing
cost. Therefore, we adopt the following linear combination
to align them through different weights, and minimize:

\[
p \cdot \frac{Cost_{lease}}{Cost_{max} - c_0 - C_d} + q \cdot (1 - P_{global})
\]

where $p$ and $q$ are two parameters that assign different
weights to the two goals. As $P_{global}$ is a relative ratio of the
preference values of all the crowdsourcers in the system (i.e.,
if $P_{global} = 1$, all the crowdsourced live streams are allocated
to their most preferred cloud sites), $(1 - P_{global})$ should be
minimized, as should $Cost_{lease}$. To make the leasing cost part also
a ratio in the range $(0, 1]$, we further divide $Cost_{lease}$
by $(Cost_{max} - c_0 - C_d)$ and then use parameters $p$ and $q$ to
linearly combine the two parts. In the next subsection,
we will transform this problem to an equivalent graph problem
and then propose an optimal solution.
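
The trade-off controlled by $p$ and $q$ can be illustrated with a small numeric sketch. All budget, cost, and preference figures below are hypothetical; they compare a consolidated assignment (cheaper, lower preference) with an all-top-preference assignment (costlier, $P_{global} = 1$).

```python
def combined_objective(cost_lease, p_global, cost_max, c0, cost_cdn,
                       p=0.5, q=0.5):
    """Weighted sum of normalized leasing cost and preference loss."""
    lease_max = cost_max - c0 - cost_cdn
    if lease_max <= 0:
        raise ValueError("budget leaves no room for leasing")
    return p * cost_lease / lease_max + q * (1.0 - p_global)


budget = dict(cost_max=30.0, c0=2.0, cost_cdn=8.0)  # Lease_max = 20
consolidated = combined_objective(10.0, 0.875, **budget)
all_top = combined_objective(16.0, 1.0, **budget)
print(consolidated, all_top)

# Weighting preference much more heavily reverses the ranking.
consolidated_q = combined_objective(10.0, 0.875, p=0.1, q=0.9, **budget)
all_top_q = combined_objective(16.0, 1.0, p=0.1, q=0.9, **budget)
print(consolidated_q, all_top_q)
```

With equal weights the consolidated assignment has the smaller objective, while with $q$ much larger than $p$ the all-top assignment wins.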

B. Equivalent Problem 

For ease of exposition, we assume the given time is $t$
for the remainder of this section and thus omit $(t)$ in all
such notations as $l_{A_i}(t)$, $l_{s_j}(t)$, $D_{l_{A_i}}(t)$, etc. Given the
geo-distributed crowdsourcers and cloud sites, we can construct
a distribution graph. Fig. 5(a) shows an example of 5 cloud
sites and global crowdsourcers located in 6 regions. There
are two types of vertices in the distribution graph, namely,
the live crowdsourcers (e.g., $l_{A_1}, \ldots, l_{A_6}$ in Fig. 5(a)), which
are represented by circles, and the cloud sites (e.g., $s_1, \ldots, s_5$),
which are represented by squares. Initially, all the live source
streams are attached to their most preferred cloud sites and we
denote the corresponding leasing cost as

\[
Cost_{initial} = \sum_{\forall A_i \in \mathbb{A},\ s_j = Index(l_{A_i}, 1)} c_j(l_{A_i})
\]

According to the price strategy $c_j(\cdot)$ of each cloud site
$s_j$, we have the directed edges between these distributed cloud
sites. We use $d(i, j)$ to denote a directed edge from the cloud
site $i$ with higher price to the cloud site $j$ with lower price (e.g.,
in Fig. 5(a), $d(4, 2)$ means that $c_2(x) < c_4(x)$ for the same
crowdsourcer $x$), which indicates that the service is migrating
towards a more cost-effective solution.

With the distribution graph and directed edges, we then
generate service migration vectors to indicate the available
cloud sites for more cost-effective service migration. We use
$\vec{m}(i, j)$ to denote a service migration vector that represents
the live crowdsourcers $l_{A_i}$ being migrated to and served by the
cloud site $s_j$, rather than the cloud site $Index(l_{A_i}, 1)$. For
example, in Fig. 5(b), the cloud site $s_4$ is preferred by the live
crowdsourcers $l_{A_5}$ and $l_{A_6}$, i.e., $s_4 = Index(l_{A_i}, 1)$ for
$i \in \{5, 6\}$. According to the directed edges $d(4, 2)$ and $d(4, 5)$,
we can have the service migration vectors $\vec{m}(5, 2)$ and $\vec{m}(5, 5)$
for the live crowdsourcers $l_{A_5}$, and $\vec{m}(6, 2)$ and $\vec{m}(6, 5)$ for
the live crowdsourcers $l_{A_6}$. Define $M$ as the set of all service
migration vectors that are generated from the given distribution
graph. For each service migration vector $\vec{m}(i, j) \in M$, the
relative preference degradation for live crowdsourcers $l_{A_i}$ to
be served by the cloud site $s_j$ can be calculated as follows:

\[
Deg(i, j) = \frac{|D_{l_{A_i}}| \cdot \big( P(l_{A_i}, s_{j'} = Index(l_{A_i}, 1)) - P(l_{A_i}, s_j) \big)}{\sum_{\forall A_{i'} \in \mathbb{A}} |D_{l_{A_{i'}}}| \cdot P(l_{A_{i'}}, Index(l_{A_{i'}}, 1))}
\]

Also, for each $\vec{m}(i, j)$, we have the cost saving as follows:

\[
Save(i, j) = c_{j'}(l_{A_i}) - c_j(l_{A_i})
\]

where $c_{j'}$ is the pricing policy of cloud site $s_{j'} = Index(l_{A_i}, 1)$.
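
The two per-vector quantities can be computed side by side. The following sketch is a simplification: it creates a vector toward every site that is strictly cheaper than the region's top choice, rather than following the directed edges of a distribution graph, and all demands, preferences, and prices are hypothetical. It loosely mirrors the situation where $s_4$ is the top choice of regions $A_5$ and $A_6$.

```python
def migration_vectors(demand, pref, price):
    """Return {(region, site): (Deg, Save)} for all cost-saving moves.

    demand: {region: |D|}
    pref:   {region: {site: preference value P}}
    price:  {site: {region: lease cost c_j(l_Ai)}}
    """
    top = {r: max(pref[r], key=pref[r].get) for r in demand}
    denom = sum(demand[r] * pref[r][top[r]] for r in demand)
    vectors = {}
    for r in demand:
        for s in pref[r]:
            # A vector exists only toward a strictly cheaper site.
            if s != top[r] and price[s][r] < price[top[r]][r]:
                deg = demand[r] * (pref[r][top[r]] - pref[r][s]) / denom
                save = price[top[r]][r] - price[s][r]
                vectors[(r, s)] = (deg, save)
    return vectors


demand = {"A5": 300, "A6": 100}
pref = {"A5": {"s4": 1.0, "s2": 0.75, "s5": 0.5},
        "A6": {"s4": 1.0, "s2": 0.5, "s5": 0.75}}
price = {"s4": {"A5": 10.0, "A6": 6.0},
         "s2": {"A5": 7.0, "A6": 4.0},
         "s5": {"A5": 8.0, "A6": 5.0}}
vectors = migration_vectors(demand, pref, price)
for key in sorted(vectors):
    print(key, vectors[key])
```

Each vector thus carries exactly the two numbers that the objective trades off: how much preference is lost and how much leasing cost is saved.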

Traversing all the service migration vectors $\vec{m}(i, j) \in M$,
we can have a service migration graph $G(V, E)$. Fig. 6(a)
shows the graph constructed from the example in Fig. 5(b). We connect the cloud sites
with at least one service migration vector through migration
directed edges. Note that there may be more than one
migration directed edge leaving from the same cloud site.
For example, in Fig. 5(b) there are two migration directed
edges, $d(4, 2)$ and $d(4, 5)$, leaving from cloud site $s_4$. Since
the set of service migration vectors $M$ has already been
generated from the migration directed edges, we can put
any one of these directed edges into the constructed service
migration graph (this is only for connectivity purposes, as
will be further explained in the next section). Finally,
we connect the crowdsourcers to the cloud sites by the service
migration vectors. In the constructed service migration graph
$G(V, E)$, we can further define the optimal service migration
(OSM) problem as finding a set of migration vectors $O \subseteq M$,
subject to the following constraints:

(1) Service migration vector constraint:

\[ \forall \vec{m}(i, j), \vec{m}(i, j') \in M \text{ and } j \neq j',\ \text{if } \vec{m}(i, j) \in O,\ \text{then } \vec{m}(i, j') \notin O \]

(2) Preference degradation constraint:

\[ \forall \vec{m}(i, j) \in O,\ s_j \in Index(l_{A_i}, k) \]

(3) Cost saving constraint:

\[ Cost_{initial} - \sum_{\forall \vec{m}(i, j) \in O} Save(i, j) + c_0 + C_d \leq Cost_{max} \]

The service migration vector constraint states that there is
at most one migration vector leaving from a live crowdsourcer
vertex, which corresponds to the cloud site service constraint
in the cloud leasing problem. The preference degradation
constraint is related to the crowdsourcer preference constraint of
the cloud leasing problem. The cost saving constraint refers to
the total cost not exceeding $Cost_{max}$ in the original problem.
Our objective is to minimize the linear combination of the leasing
cost after saving and the relative preference degradation as follows:

\[
\begin{aligned}
& \frac{p}{Lease_{max}} \Big( Cost_{initial} - \sum_{\forall \vec{m}(i, j) \in O} Save(i, j) \Big) + q \Big( 1 - \big( 1 - \sum_{\forall \vec{m}(i, j) \in O} Deg(i, j) \big) \Big) \\
={}& \frac{p \cdot Cost_{initial}}{Lease_{max}} + \sum_{\forall \vec{m}(i, j) \in O} \Big( q \cdot Deg(i, j) - \frac{p \cdot Save(i, j)}{Lease_{max}} \Big)
\end{aligned}
\]

where $Lease_{max} = Cost_{max} - c_0 - C_d$. As $Cost_{initial}$ is fixed and cannot
be further reduced, our objective can thus be simplified to
minimizing

\[
\sum_{\forall \vec{m}(i, j) \in O} \Big( q \cdot Deg(i, j) - \frac{p \cdot Save(i, j)}{Lease_{max}} \Big)
\]

The OSM problem in graph $G(V, E)$ can be naturally related
to the cloud site leasing problem: the optimal solution $O$
indicates the service allocation for the crowdsourcers in each
region toward the distributed cloud sites. Fig. 6(b) shows
an example with $O = \{\vec{m}(1, 4), \vec{m}(3, 3), \vec{m}(5, 2), \vec{m}(6, 5)\}$.
Therefore, we have the set of live crowdsourcers served
in each cloud site as follows: $l_{s_1} = \emptyset$, $l_{s_2} = l_{A_2} \cup l_{A_5}$,
$l_{s_3} = l_{A_3} \cup l_{A_4}$, $l_{s_4} = l_{A_1}$, and $l_{s_5} = l_{A_6}$.
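
The mapping from a migration set $O$ back to the per-site stream sets can be sketched as follows. The initial top-1 sites of regions $A_2$, $A_4$, $A_5$, and $A_6$ follow from the example above; the initial sites of $A_1$ and $A_3$ are not stated in the text, so placing both on $s_1$ is an assumption made only so that the sketch runs.

```python
def apply_migration(sites, initial, migration):
    """Return {site: sorted list of regions served} after migration.

    sites:     iterable of all cloud site indices
    initial:   {region: index of its top-1 preferred site}
    migration: {region: target site}, one entry per vector m(i, j) in O
    """
    served = {s: [] for s in sites}
    for region, site in initial.items():
        # A region without a migration vector stays on its top-1 site.
        served[migration.get(region, site)].append(region)
    return {s: sorted(regions) for s, regions in served.items()}


# Assumption: A1 and A3 start on s1; the other entries follow the example.
initial = {1: 1, 2: 2, 3: 1, 4: 3, 5: 4, 6: 4}
O = {1: 4, 3: 3, 5: 2, 6: 5}  # m(1,4), m(3,3), m(5,2), m(6,5)
served = apply_migration(range(1, 6), initial, O)
print(served)
```

Under that assumption, the result matches the allocation listed above, with site $s_1$ left empty and therefore not leased.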


V. Optimal Cloud Leasing Strategy 

The optimal solution of the equivalent problem can be
computed according to the spanning trees in the service migration
graph. Clearly, a spanning tree is a subgraph of the directed
graph $G(V, E)$. Let $T$ denote the number of spanning trees in a
service migration graph $G(V, E)$. We define the set of service
migration vectors in the $\lambda$-th spanning tree ($\lambda \in \{1, \ldots, T\}$) as
$M_\lambda$, and the optimal solution of $M_\lambda$ as $O_\lambda$. We then have the
following theorem:

Theorem 1. There must exist an optimal solution $O$ of the
service migration vectors $M$ in the service migration graph
$G(V, E)$, such that $O \in \{O_1, \ldots, O_T\}$.

We can prove this by contradiction. Assume that
there exists an optimal solution set of the service migration
vectors $O$ whose edges contain a cycle. There are then two scenarios
in which the edges of a directed graph can contain a cycle: (1) the directed
edges are sequenced in a line one after another, with the end
vertex pointing toward the head vertex; (2) there is more than
one directed edge leaving from the same vertex. As there is
no edge pointing toward a live crowdsourcer vertex in the
directed graph $G(V, E)$, in scenario 1 there would be cloud sites sequenced
in a cycle, and we have the contradiction $c_1(l) > c_2(l) >
\ldots > c_{end}(l) > c_1(l)$. We can also eliminate scenario 2
according to the definition of the service migration graph. Due to
space limitations, we omit the details of the proof here, which
can be found in our technical report.

According to Theorem 1, each spanning tree can provide a
local optimal solution, and the global optimal solution can be
obtained by exploring all the spanning trees in $G(V, E)$. There
are extensive studies on enumerating all the spanning trees
in a directed graph [15] [16]. For example, a well-known algorithm
in [15] uses backtracking and a method for detecting bridges
based on depth-first search, with time complexity
$O(|V| + |E| + |E| \cdot T)$ and space complexity $O(|V| + |E|)$.
For a spanning tree $i$ in the service migration graph $G(V,E)$,
the service migration vectors $M_i$ (and each of its subsets)
are feasible solutions under the service migration vector
constraint. By enforcing the preference degradation constraint,
a number of spanning trees can be further screened out.
Thus, for a remaining spanning tree $i$, we need to calculate




Algorithm 1: Optimal service migration()

$O = \emptyset$
for each enumerated spanning tree $\lambda$ on $G(V, E)$ do
    if tree $\lambda$ fulfils the preference degradation constraint then
        if $\sum_{m(i,j) \in M_\lambda} Save(i,j) > Save_{min}$ then
            $\bar{O} = F(M_\lambda, \sum_{m(i,j) \in M_\lambda} Save(i,j) - Save_{min})$
            $O_\lambda = M_\lambda - \bar{O}$
            if $objective(O_\lambda) < objective(O)$ then
                $O = O_\lambda$
            end
        end
    end
end
return $O$ as the global optimal solution for $G(V, E)$


the local optimal migration vector set $O_i$ to minimize the
combinational objective under the cost saving constraint, which
can be solved as a classic 0-1 knapsack problem. In particular,
let $F(ItemSet, TotalWeight)$ denote the standard 0-1
knapsack problem. The $ItemSet$ is $M_i$ in our problem and the
$TotalWeight$ is equal to

$$\sum_{m(i,j) \in M_i} Save(i,j) - Save_{min},$$

where $Save_{min} = Cost_{initial} + C_o + C_d - Cost_{max}$.

We thus need to select a set of items $\bar{M}$ (service migration
vectors) in the $ItemSet$ ($M_i$) with the total weight

$$\sum_{\forall m(i,j) \in \bar{M}} Save(i,j) \le \sum_{m(i,j) \in M_i} Save(i,j) - Save_{min}$$

so as to maximize the total value

$$\sum_{\forall m(i,j) \in \bar{M}} \left( q \cdot Deg(i,j) - \frac{p}{Lease_{max}} \cdot Save(i,j) \right).$$

From the optimal solution $\bar{O}$ of $F$, we can thus calculate
the optimal solution $O_i$ of $M_i$ on the spanning tree $i$ as
$O_i = M_i - \bar{O}$. Then the global optimal solution can be found
through enumerating all the spanning trees on the service 
migration graph G(V,E). We summarize this optimal solution 
in Algorithm 1. 
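
The per-tree step can be illustrated with a short Python sketch that solves the knapsack $F$ by dynamic programming and takes the complement as the local solution. This is a minimal sketch under stated assumptions, not the implementation used in our prototype: savings and $Save_{min}$ are assumed to be integers (e.g., cents per hour) with non-negative savings, so that a table-based solver applies; the names `MigrationVector`, `knapsack_01`, and `local_optimum` are hypothetical; and `p_over_lease` stands for $p / Lease_{max}$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationVector:
    """Hypothetical record for one service migration vector m(i, j)."""
    name: str
    save: int    # Save(i, j), assumed to be a non-negative integer
    deg: float   # Deg(i, j), preference degradation of the migration


def knapsack_01(weights, values, capacity):
    """Classic 0-1 knapsack by dynamic programming.

    Returns the set of item indices that maximizes the total value
    while keeping the total weight at or below `capacity`.
    """
    n = len(weights)
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        w, v = weights[k - 1], values[k - 1]
        for c in range(capacity + 1):
            best[k][c] = best[k - 1][c]              # skip item k-1
            if w <= c and best[k - 1][c - w] + v > best[k][c]:
                best[k][c] = best[k - 1][c - w] + v  # take item k-1
    chosen, c = set(), capacity
    for k in range(n, 0, -1):                        # backtrack the choices
        if best[k][c] != best[k - 1][c]:
            chosen.add(k - 1)
            c -= weights[k - 1]
    return chosen


def local_optimum(vectors, save_min, q, p_over_lease):
    """Return O_i = M_i - O_bar for one spanning tree, or None if infeasible."""
    total_save = sum(v.save for v in vectors)
    if total_save <= save_min:       # cost saving constraint cannot be met
        return None
    capacity = total_save - save_min
    weights = [v.save for v in vectors]
    values = [q * v.deg - p_over_lease * v.save for v in vectors]
    removed = knapsack_01(weights, values, capacity)
    return [v for k, v in enumerate(vectors) if k not in removed]


# Illustrative numbers only; they are not taken from the experiments.
tree_vectors = [
    MigrationVector("m(1,4)", save=4, deg=1.0),
    MigrationVector("m(5,2)", save=3, deg=5.0),
    MigrationVector("m(6,5)", save=2, deg=5.0),
]
kept = local_optimum(tree_vectors, save_min=6, q=1.0, p_over_lease=1.0)
print([v.name for v in kept])   # ['m(1,4)', 'm(5,2)']
```

In this toy instance the knapsack capacity is 9 - 6 = 3, so only one of the two costly migrations can be dropped; the solver drops m(6,5), whose removal yields the larger value, and the kept vectors still save 7 units, which satisfies the cost saving constraint.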

It is worth noting that finding the optimal solution for the
standard 0-1 knapsack problem can become a time-consuming
task when the crowdsourcers are distributed at a large scale, which
can make the optimal solution proposed in Algorithm 1 less
suitable in practice, especially for an online system with highly
dynamic crowdsourcer distribution and viewer demand. To this
end, we further propose a simplified heuristic algorithm in
Algorithm 2, which works efficiently and still returns the
global optimal solution under certain conditions. We then have
the following theorem:

Theorem 2. Algorithm 2 can return the global optimal
solution when $Cost_{initial} + C_o + C_d < Cost_{max}$ for each
enumerated spanning tree.

Note that, if we can prove that the local optimal solution
in each spanning tree can be achieved by Algorithm 2 when
$Cost_{initial} + C_o + C_d < Cost_{max}$, we can then prove
that Algorithm 2 returns the global optimal solution by
Theorem 1. We can prove this by contradiction, assuming
that there is a spanning tree $\lambda$ with $Cost_{initial} + C_o + C_d <
Cost_{max}$ that has an optimal solution $O^{*}_\lambda \subseteq M_\lambda$, which
is better than the solution $O_\lambda$ found by Algorithm 2. As


Algorithm 2: Efficient online service migration()

$O = \emptyset$
for each enumerated spanning tree $\lambda$ on $G(V, E)$ do
    if tree $\lambda$ fulfils the preference degradation constraint then
        $O_\lambda = \emptyset$
        $Total_{save} = 0$
        sort $m(i, j) \in M_\lambda$ with in increasing order
        for $m(i, j) \in M_\lambda$ do
            if ($q \cdot Deg(i, j) < \frac{p}{Lease_{max}} \cdot Save(i, j)$) or ($Total_{save} < Save_{min}$) then
                put $m(i, j)$ into $O_\lambda$
                $Total_{save} = Total_{save} + Save(i, j)$
            end
        end
        if $objective(O_\lambda) < objective(O)$ then
            $O = O_\lambda$
        end
    end
end
return $O$ as the online solution for graph $G(V, E)$

$Save_{min} = Cost_{initial} + C_o + C_d - Cost_{max} < 0$, we always
have $Total_{save} \ge Save_{min}$. Thus, for all $m(i, j) \in O_\lambda$, we
have $q \cdot Deg(i,j) < \frac{p}{Lease_{max}} \cdot Save(i,j)$. The contradiction
can thus be reached by first identifying the difference between
$O_\lambda$ and the assumed optimal solution $O^{*}_\lambda$, and then showing that
changing $O^{*}_\lambda$ according to $O_\lambda$ can further improve $O^{*}_\lambda$.
Due to space limitations, we omit the details of the proof, which can be
found in our technical report.
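
The inner loop of Algorithm 2 for a single spanning tree can be sketched in Python as follows. This is an illustrative sketch rather than our implementation. The sort key is an assumption made here for the example (increasing $q \cdot Deg(i,j) - \frac{p}{Lease_{max}} \cdot Save(i,j)$), the names `MigrationVector` and `greedy_select` are hypothetical, and `p_over_lease` stands for $p / Lease_{max}$.

```python
from collections import namedtuple

# Hypothetical record for one service migration vector m(i, j).
MigrationVector = namedtuple("MigrationVector", ["name", "save", "deg"])


def greedy_select(vectors, save_min, q, p_over_lease):
    """Greedy pass of the online heuristic over one spanning tree.

    A vector is selected if its saving outweighs its degradation, or if
    the accumulated saving has not yet reached save_min.
    """
    def net_penalty(v):
        # Assumed sort key: degradation minus weighted saving.
        return q * v.deg - p_over_lease * v.save

    selected, total_save = [], 0.0
    for v in sorted(vectors, key=net_penalty):
        beneficial = q * v.deg < p_over_lease * v.save
        if beneficial or total_save < save_min:
            selected.append(v)
            total_save += v.save
    return selected, total_save


# Illustrative numbers only; they are not taken from the experiments.
tree_vectors = [
    MigrationVector("m(1,4)", save=4, deg=1.0),
    MigrationVector("m(5,2)", save=3, deg=5.0),
    MigrationVector("m(6,5)", save=2, deg=5.0),
]

# Tight budget: a non-beneficial migration is added to reach save_min.
tight, tight_save = greedy_select(tree_vectors, save_min=6, q=1.0, p_over_lease=1.0)
print([v.name for v in tight], tight_save)    # ['m(1,4)', 'm(5,2)'] 7.0

# Negative save_min (the case of Theorem 2): only beneficial migrations remain.
loose, loose_save = greedy_select(tree_vectors, save_min=-1, q=1.0, p_over_lease=1.0)
print([v.name for v in loose], loose_save)    # ['m(1,4)'] 4.0
```

The second call mirrors the argument above: when $Save_{min} < 0$, the test on the accumulated saving never fires, so every selected vector satisfies $q \cdot Deg(i,j) < \frac{p}{Lease_{max}} \cdot Save(i,j)$.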

VI. Performance Evaluation 

We have implemented the crowdsourced live streaming
system as a prototype based on PlanetLab, Amazon Cloud,
Microsoft Azure Cloud, and the open-source VLC/VLM coder,
and have conducted real-world experiments to understand its
performance. We have also performed trace-driven simulations
to further evaluate the system performance at large scale.

A. Prototype experimental results 

In our prototype implementation, both the live crowdsourcers
and the end users are deployed on 398 PlanetLab nodes,
each of which is set up with VLC media player 0.8.7 Janus.
We deploy a federation of cloud services
from Microsoft Azure Cloud and Amazon Cloud in our
prototype platform. Together, these two cloud service providers
offer 21 cloud sites distributed all over the world.
In each cloud site, the General Purpose instances
are provisioned with Medium (A2) from Microsoft Azure
Cloud and m3.medium from Amazon Cloud. Each provisioned
instance is set up with Ubuntu 14.04 LTS and
installed with VLM to manage multiple live streaming channels.
Further, we deploy the CloudFront CDN service in All
Edge Locations for globalized content delivery to the
geo-distributed viewers. To evaluate the streaming
quality, the live feeds are generated from videos uploaded
from the distributed PlanetLab nodes. We use a series of test
videos with different resolutions and bitrates 5 . Each dedicated
sourcer stores one of these videos as its own live feed. We
deploy 18 cloud sites in different regions from Amazon Cloud
and Microsoft Azure: 9 in the America area, 3 in the Europe
area, and 6 in Asia Pacific. To explore the
distribution of the 398 PlanetLab nodes, we measure the RTT
latency between the nodes and the cloud sites, and use the
cloud site with the minimal latency to approximate their
5 http://www.cs.sfu.ca/~jcliu/infocom15/crowdsourcing/videos
Figure 7: RTT latency
Figure 8: Different regions
Figure 9: Videos with different bitrates


Table II: Three cloud leasing strategies for crowdsourced live streaming from 7 areas

Areas (number of sourcers): Van (10), CA (19), VA (20), SA (5), K. and J. (20), CHN (16), S. and A. (4)

| Strategy | Provisioned instances (cloud site) | Cost |
|---|---|---|
| TOP preferred first strategy | m3 × 3 (Oregon); m3 × 2 (Virginia); m1 × 1 (Sao Paulo); m3 × 2 (Tokyo); m3 × 1 + m1 × 1 (Singapore); m1 × 1 (Sydney) | $5.584 per hour |
| Centralized provisioning strategy | m3 × 5 (Virginia); m3 × 4 (Singapore) | $4.77 per hour |
| Optimal migration | m3 × 3 (Oregon); m3 × 2 + m1 × 1 (Virginia); m3 × 2 (Tokyo); m3 × 2 (Singapore) | $5.118 per hour |


locations. In Fig. 7, we present the node population and
the average RTT latency to their top-1 preferred cloud
sites. With the latency results, each sourcer can construct a
preference list of the cloud sites. To measure the delay,
we stream a timer video 6 live from the
PlanetLab node to the cloud server. We also use ffmpeg to
measure the frame loss ratio during the live streaming by
recording the number of duplicated frames (i.e., the
current frame is not received by the playback deadline, so the
former frame is duplicated) and dropped frames (i.e., the frame
is received but corrupted). The PlanetLab nodes are divided
into groups according to the RTT latency in Fig. 7. We present
the streaming delay in different areas in Fig. 8, and the frame loss
ratio with different videos in Fig. 9. Generally, we can see that the
streaming delay increases by more than 80% if the latency is above
20ms. On the other hand, the frame loss ratio is stable when
the latency is under 200ms.
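
The construction of the preference lists from the RTT measurements can be sketched as follows. This is a minimal illustration, assuming that each node holds a mapping from cloud site to measured RTT in milliseconds; the function names, node names, and RTT values are hypothetical and are not measurements from the prototype.

```python
def preference_list(rtt_ms):
    """Return cloud sites ordered from most to least preferred (lowest RTT first)."""
    return sorted(rtt_ms, key=rtt_ms.get)


def group_by_top_site(node_rtts):
    """Group nodes by their top-1 preferred (minimal-RTT) cloud site."""
    groups = {}
    for node, rtt_ms in node_rtts.items():
        top_site = preference_list(rtt_ms)[0]
        groups.setdefault(top_site, []).append(node)
    return groups


# Illustrative values only; these are not measurements from the prototype.
node_rtts = {
    "node-a": {"Oregon": 18.0, "Virginia": 75.0, "Tokyo": 110.0},
    "node-b": {"Oregon": 82.0, "Virginia": 12.0, "Tokyo": 160.0},
    "node-c": {"Oregon": 25.0, "Virginia": 70.0, "Tokyo": 95.0},
}
print(preference_list(node_rtts["node-a"]))  # ['Oregon', 'Virginia', 'Tokyo']
print(group_by_top_site(node_rtts))          # {'Oregon': ['node-a', 'node-c'], 'Virginia': ['node-b']}
```

Grouping by the top-1 site gives the per-site node population used to approximate where the sourcers are located.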

We further investigate the server provisioning cost
and the video streaming quality of the cloud-based strategies
through the implementation on the prototype platform. Besides
our proposed optimal migration (OM) strategy, two
other cloud-based strategies are implemented for comparison.
The top preferred first (TOP) strategy deploys all
the available cloud sites and serves each sourcer in
its most preferred cloud site. Meanwhile, in the centralized
provisioning (CP) strategy, the cloud servers are allocated
in the regions with the most sourcers. Here we select Virginia
and Singapore as the central regions, and consider CP as the
benchmark strategy. The implementation details of the cloud
leasing strategies are presented in Tab. II. For example, m3 × 1 +
m1 × 1 (Singapore) means that one m3.xlarge instance and one
m1.large instance are provisioned in the Singapore region to serve
16 sourcers. We also calculate the server provisioning cost per
hour according to the prices of Amazon EC2. CloudFront is
deployed as the CDN for global distribution, and we record the
average frame loss ratio from 20 distributed users. Generally,
the frame loss ratios can be reduced by about 10% with the TOP
and OM strategies. In particular, for the PlanetLab nodes in China,
the improvement can reach almost 30% with the proposed
6 http://www.cs.sfu.ca/~jcliu/infocom15/crowdsourcing/timer.mkv


strategy. Compared with the TOP strategy, our proposed solution
saves 8.34% of the cost and improves the video quality by 9.1% on average.
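
As a quick check of the reported saving, reading the hourly costs of the TOP and OM strategies in Table II as 5.584 and 5.118 USD per hour, respectively, the relative saving is

$$\frac{5.584 - 5.118}{5.584} = \frac{0.466}{5.584} \approx 0.08345,$$

which corresponds to the saving of roughly 8.34% stated above.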

B. Trace-driven simulation results 

To further evaluate the performance of the proposed strategy
at a larger scale, we simulate the system with real-world
trace data from Twitch.tv and the measurement results from the
prototype system. The diverse prices of the distributed cloud sites
are taken from Amazon Cloud and Microsoft Azure Cloud.
We consider a conventional centralized dedicated
server (CDS) strategy as the benchmark, in which a
single server is allocated in the central region to serve the
global requests. Its cost should cover the peak user
demand, and we take this cost as the budget constraint
in our proposed OM strategy. We also set p/q = 0.1, and
the preference value is inversely proportional to the RTT
latency. The two other cloud-based strategies, TOP and CP, are deployed for
comparison. All these cloud-based strategies can scale their
provisioning capacity adaptively to the user demand.

Fig. 11 shows the streaming delay reduction of the three
cloud-based strategies compared with the benchmark CDS
strategy. Generally, the TOP and OM strategies, which deploy the
geo-distributed cloud service, can reduce the streaming delay
of the benchmark strategy by almost 50%. The CP strategy achieves
an improvement only when most of the viewers concentrate on
several sourcers from the same region (e.g., 3:00AM-8:00AM
in Asia and 1:00PM-4:00PM in Europe). Unlike
the streaming delay reduction, the frame loss reduction
varies more over time, as shown in Fig. 12. Before 8:00AM,
most of the popular sourcers are from Europe and Asia, so the
CDS strategy suffers from long transmission paths, even though
the total number of streams is not large and the rented server
still has spare bandwidth capacity. After
9:00AM, sourcers from North America attract more viewer
demand. The dedicated server can then provide an acceptable
service with a lower frame loss ratio. In Fig. 13, we present
the cost ratio between the three cloud-based strategies and
the benchmark strategy. As the server instances are allocated
in the distributed cloud sites with diverse prices, the TOP
strategy can lead to a higher cost when the peak demand
comes. Because of the budget constraint, the provisioning

Figure 10: Implementation results 





Figure 13: Reduction of provisioning cost 


cost in our proposed strategy is kept below the cost of the
benchmark. Yet, compared with the TOP strategy, the gap in
streaming delay and frame loss ratio can still be kept within
5%, and almost 30% of the provisioning cost is saved through
service migration during peak demand.

VII. Conclusion and Future Work 

In this paper, we explored the emerging crowdsourced live
streaming systems, in which both the number and distribution
of the crowdsourcers can be highly dynamic. This
motivated the design of a cloud leasing strategy to optimize the
cloud site allocation for geo-distributed live crowdsourcers. A
prototype of the crowdsourced live streaming platform was built
with Amazon Cloud/Microsoft Azure and PlanetLab nodes. The
performance of the proposed strategy was evaluated through
extensive experiments.

Our work is an initial study, and there are still many
open issues to be further explored. We plan to continue
enhancing our design by conducting more evaluations on our
prototype with larger-scale experiments. Our ongoing work
includes tailoring our method for some specific crowdsourced
live streaming applications, such as synchronizing multiple
collaborative crowdsourced live videos for 3D immersive
environment reconstruction or real-time interaction. We are
also interested in extending our current deployment strategy
to a more general scenario, in which the distributed server
instances can cooperate with CDNs for a larger service
coverage at a lower cost. In addition, we believe that the dynamics
of geo-distributed crowdsourcers are predictable. There
are two major types of live sources, namely, scheduled sources
and non-scheduled sources. With scheduled sources, the
crowdsourcers follow some social event during a certain time,
such as a presidential election or a football match, which
is easy to predict. With non-scheduled sources, the
crowdsourcers can start their live streaming arbitrarily. The
time-varying live sources usually relate to the dynamic viewer
demand, since the crowdsourcers are motivated to gain more
subscribers as a reward. They tend to broadcast at a fixed time
every day, or choose a period when a peak number of viewers
can be reached. This behavior of crowdsourcers is evident in
some modern crowdsourced live streaming platforms, such as
Twitch.tv. Our solution could be enhanced with crowdsourcer
prediction through user behavior analysis from real-world
measurement results.